Search results for "Matrix multiplication"
showing 10 items of 20 documents
FeatherCNN: Fast Inference Computation with TensorGEMM on ARM Architectures
2020
Deep Learning is ubiquitous in a wide field of applications ranging from research to industry. In comparison to time-consuming iterative training of convolutional neural networks (CNNs), inference is a relatively lightweight operation making it amenable to execution on mobile devices. Nevertheless, lower latency and higher computation efficiency are crucial to allow for complex models and prolonged battery life. Addressing the aforementioned challenges, we propose FeatherCNN – a fast inference library for ARM CPUs – targeting the performance ceiling of mobile devices. FeatherCNN employs three key techniques: 1) A highly efficient TensorGEMM (generalized matrix multiplication) routine is app…
Reverse-safe data structures for text indexing
2021
We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optim…
Generalized centro-invertible matrices with applications
2014
Centro-invertible matrices are introduced by R.S. Wikramaratna in 2008. For an involutory matrix R, we define the generalized centro-invertible matrices with respect to R to be those matrices A such that RAR = A^−1. We apply these matrices to a problem in modular arithmetic. Specifically, algorithms for image blurring/deblurring are designed by means of generalized centro-invertible matrices. In addition, if R1 and R2 are n × n involutory matrices, then there is a simple bijection between the set of all centro-invertible matrices with respect to R1 and the set with respect to R2.
Fast Matrix Multiplication
2015
Until a few years ago, the fastest known matrix multiplication algorithm, due to Coppersmith and Winograd (1990), ran in time O(n2.3755). Recently, a surge of activity by Stothers, Vassilevska-Williams, and Le~Gall has led to an improved algorithm running in time O(n2.3729). These algorithms are obtained by analyzing higher and higher tensor powers of a certain identity of Coppersmith and Winograd. We show that this exact approach cannot result in an algorithm with running time O(n2.3725), and identify a wide class of variants of this approach which cannot result in an algorithm with running time $O(n^{2.3078}); in particular, this approach cannot prove the conjecture that for every e > 0, …
An Scalable matrix computing unit architecture for FPGA and SCUMO user design interface
2019
High dimensional matrix algebra is essential in numerous signal processing and machine learning algorithms. This work describes a scalable square matrix-computing unit designed on the basis of circulant matrices. It optimizes data flow for the computation of any sequence of matrix operations removing the need for data movement for intermediate results, together with the individual matrix operations’ performance in direct or transposed form (the transpose matrix operation only requires a data addressing modification). The allowed matrix operations are: matrix-by-matrix addition, subtraction, dot product and multiplication, matrix-by-vector multiplication, and matrix by scalar multiplication.…
Versatile Direct and Transpose Matrix Multiplication with Chained Operations: An Optimized Architecture Using Circulant Matrices
2016
With growing demands in real-time control, classification or prediction, algorithms become more complex while low power and small size devices are required. Matrix multiplication (direct or transpose) is common for such computation algorithms. In numerous algorithms, it is also required to perform matrix multiplication repeatedly, where the result of a multiplication is further multiplied again. This work describes a versatile computation procedure and architecture: one of the matrices is stored in internal memory in its circulant form, then, a sequence of direct or transpose multiplications can be performed without timing penalty. The architecture proposes a RAM-ALU block for each matrix c…
Reverse-Safe Text Indexing
2021
We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z - reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D . The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z , we propose an algorithm that constructs a z -reverse-safe data structure ( z -RSDS) that has size O(n) and answers decision and counting pattern matc…
A Practical Introduction to Tensor Networks: Matrix Product States and Projected Entangled Pair States
2013
This is a partly non-technical introduction to selected topics on tensor network methods, based on several lectures and introductory seminars given on the subject. It should be a good place for newcomers to get familiarized with some of the key ideas in the field, specially regarding the numerics. After a very general introduction we motivate the concept of tensor network and provide several examples. We then move on to explain some basics about Matrix Product States (MPS) and Projected Entangled Pair States (PEPS). Selected details on some of the associated numerical methods for 1d and 2d quantum lattice systems are also discussed.
Fast Approximated Discriminative Common Vectors Using Rank-One SVD Updates
2013
An efficient incremental approach to the discriminative common vector (DCV) method for dimensionality reduction and classification is presented. The proposal consists of a rank-one update along with an adaptive restriction on the rank of the null space which leads to an approximate but convenient solution. The algorithm can be implemented very efficiently in terms of matrix operations and space complexity, which enables its use in large-scale dynamic application domains. Deep comparative experimentation using publicly available high dimensional image datasets has been carried out in order to properly assess the proposed algorithm against several recent incremental formulations.
A recurrence-free variant of strassen’s algorithm on hypercube
1995
In this paper a non-recursive Strassen’s matrix multiplication algorithm is presented. This new algorithm is suitable to run on parallel environments. Two computational schemes have been worked out exploiting different parallel approaches on hypercube architecture. A comparative analysis is reported. The experiments have been carried out on an nCUBE-2 supercomputer, housed at CNUCE in Pisa, supporting the Express parallel operating system. © 1995, Taylor & Francis Group, LLC. All rights reserved.